fix(docs-mcp): recursively crawl and register nested llms.txt resources #2317

Merged: colinaaa merged 3 commits into main from fix/docs-mcp-recursion on May 12, 2026

Conversation

hzy (Collaborator) commented Mar 6, 2026

This PR fixes an issue where nested documentation resources (linked from sub-indexes like api/llms.txt) were not being registered by the MCP server, causing 'Resource not found' errors.

Changes:

  • Implemented recursive crawling of llms.txt files in main.ts.
  • Added logic to fetch and parse nested index files and register their linked resources.
  • Added HTTP status checks and improved error handling.
  • Added changeset.
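
The recursive crawl described above could look roughly like the sketch below. It is a minimal illustration, not the code from main.ts: `crawlAndRegister`, `extractLinks`, `FetchText`, and `Register` are hypothetical names, and the fetcher is injected so the sketch runs without network access.

```typescript
// Hypothetical sketch; names and types are illustrative, not the PR's main.ts.
type FetchText = (url: string) => Promise<string>;
type Register = (url: string) => void;

// Collect the targets of markdown links of the form [title](url).
function extractLinks(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

async function crawlAndRegister(
  indexMarkdown: string,
  fetchText: FetchText,
  register: Register,
  visited: Set<string> = new Set(),
): Promise<void> {
  // for...of (not forEach) so that await and continue behave as expected.
  for (const url of extractLinks(indexMarkdown)) {
    if (!url.endsWith("llms.txt")) {
      register(url); // ordinary document: register and move on
      continue;
    }
    if (visited.has(url)) continue; // already crawled: avoids cycles
    visited.add(url);
    try {
      const nested = await fetchText(url);
      register(url); // the nested index is itself a resource
      await crawlAndRegister(nested, fetchText, register, visited);
    } catch (error) {
      // Log and keep going so one bad index does not stop the crawl.
      console.error(`Skipping ${url}:`, error);
    }
  }
}
```

Sharing one `visited` set across the recursion is what keeps an index that links back to itself, or to a parent index, from being crawled twice.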

Summary by CodeRabbit

  • Bug Fixes

    • Adds recursive crawling and registration of nested documentation indices so deeper docs are discovered.
    • Prevents duplicate processing of already-seen documentation sources.
    • Improves fetch error handling with clearer logging and continues processing remaining resources on failures.
  • Chores

    • Added a changeset entry to mark a patch release.

Copilot AI review requested due to automatic review settings March 6, 2026 16:20

changeset-bot Bot commented Mar 6, 2026

🦋 Changeset detected

Latest commit: 7c2e3a9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @lynx-js/docs-mcp-server | Patch |


cla-assistant Bot commented Mar 6, 2026

CLA assistant check
All committers have signed the CLA.

coderabbitai Bot (Contributor) commented Mar 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1268a700-3178-4e0d-935c-1717170da73c

📥 Commits

Reviewing files that changed from the base of the PR and between 6877936 and 7c2e3a9.

📒 Files selected for processing (1)
  • .changeset/fix-recursive-docs-mcp.md
✅ Files skipped from review due to trivial changes (1)
  • .changeset/fix-recursive-docs-mcp.md

📝 Walkthrough

Converts the docs MCP server registrar into an async crawler that recursively discovers and registers nested llms.txt markdown indexes, tracks visited URLs to avoid cycles, switches traversal to for...of for awaitable control flow, adds guarded fetch/error logging, and adds a changeset entry.

Changes

Changeset Entry

| Layer / File(s) | Summary |
| --- | --- |
| Changeset metadata (`.changeset/fix-recursive-docs-mcp.md`) | Adds a changeset documenting a patch release for `@lynx-js/docs-mcp-server` describing the fix to recursively crawl and register nested llms.txt resources. |

Docs MCP Server crawler

| Layer / File(s) | Summary |
| --- | --- |
| Crawler signature & wiring (`packages/mcp-servers/docs-mcp-server/main.ts`) | Renames `registerResources` → `crawlAndRegisterResources`, makes it async, adds a `visited: Set<string> = new Set()` param, and updates `main` to initialize `visited` and await the crawler. |
| Traversal control flow (`packages/mcp-servers/docs-mcp-server/main.ts`) | Replaces `linkUrls.forEach(...)` with a `for...of` loop over entries to allow `await` and `continue` during traversal. |
| Nested llms.txt handling & recursion (`packages/mcp-servers/docs-mcp-server/main.ts`) | When a link's stripped path ends with llms.txt and is not in `visited`, fetch the nested index, register it as a `lynx-docs://...` markdown resource, mark it visited, and recursively call `crawlAndRegisterResources`; fetch failures are caught and logged, and traversal continues. |
| Guarded resource registration (`packages/mcp-servers/docs-mcp-server/main.ts`) | Non-llms.txt registration now uses an async factory that awaits `fetch(link.url)`, throws on non-OK responses, and returns markdown only on success (replacing the prior unconditional `res.text()` chaining). |
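
As an illustration of the guarded registration row, a read factory that rejects non-OK responses might be sketched as follows. `makeGuardedReader`, `ResponseLike`, and `Fetcher` are hypothetical names introduced here; the fetcher is passed in rather than calling the global `fetch` directly, which is an assumption made to keep the sketch testable.

```typescript
// Hypothetical sketch; not the actual factory in main.ts.
// ResponseLike lists only the members of a fetch Response that are used here.
interface ResponseLike {
  ok: boolean;
  status: number;
  statusText: string;
  text(): Promise<string>;
}
type Fetcher = (url: string) => Promise<ResponseLike>;

// Returns an async reader that fetches the URL each time it is called
// and throws instead of returning the body of an error page.
function makeGuardedReader(url: string, fetcher: Fetcher): () => Promise<string> {
  return async () => {
    const res = await fetcher(url);
    if (!res.ok) {
      throw new Error(`Failed to fetch ${url}: ${res.status} ${res.statusText}`);
    }
    return res.text();
  };
}
```

In a runtime that provides the standard `fetch`, passing `fetch` as the `fetcher` argument should type-check, since `Response` has all four members of `ResponseLike`.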

Sequence Diagram

sequenceDiagram
    participant Main
    participant Crawler
    participant HTTP
    participant MCP
    Main->>Crawler: start(baseURL, fromMarkdownText, visited)
    Crawler->>HTTP: fetch(link.url)
    HTTP-->>Crawler: response
    alt link ends with llms.txt & not visited
        Crawler->>MCP: register lynx-docs://... (markdown)
        Crawler->>Crawler: mark visited and recurse with nested markdown
    else normal resource
        Crawler->>MCP: guarded register resource (on HTTP OK)
    end
    Crawler-->>Main: finished

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lynx-family/lynx-stack#1925: Modifies the same packages/mcp-servers/docs-mcp-server/main.ts to introduce async recursive crawling and visited-set handling.

Suggested reviewers

  • colinaaa

Poem

🐰 I hop through links both near and far,

I sniff each llms.txt like a guiding star.
I mark what's new and skip what's been read,
I log a tumble, then bound onward instead.
Recursive trails keep my whiskers led.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: implementing recursive crawling of nested llms.txt resources in the docs-mcp server. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
packages/mcp-servers/docs-mcp-server/main.ts (1)

122-130: Consider making the resource factory consistent with non-nested resources.

The factory here returns cached nestedMarkdown captured at registration time, while regular resources (lines 157-171) use an async factory that fetches fresh content on each read. This creates behavioral inconsistency: nested index resources return startup-time content, while other resources reflect current server content.

If this caching is intentional (avoiding redundant fetches for stable index files), consider adding a brief comment to document the design decision.
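
The difference between the two behaviours can be shown with a small sketch. `cachedFactory` and `freshFactory` are hypothetical names, and the text fetcher is an injected stand-in rather than the PR's actual fetch logic.

```typescript
// Hypothetical sketch contrasting the two factory styles discussed above.
type ReadMarkdown = () => Promise<string>;

// Startup-time capture: every read returns the text seen at registration.
function cachedFactory(markdownAtRegistration: string): ReadMarkdown {
  return async () => markdownAtRegistration;
}

// Fetch-on-read: every read asks the source again and sees later changes.
function freshFactory(
  url: string,
  fetchText: (url: string) => Promise<string>,
): ReadMarkdown {
  return () => fetchText(url);
}
```

The cached form saves one request per read at the cost of serving stale index content until the server restarts; the fresh form trades the opposite way.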

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mcp-servers/docs-mcp-server/main.ts` around lines 122 - 130, The
resource factory currently returns the cached nestedMarkdown captured at
registration (the factory returning () => ({ contents: [{ uri:
`lynx-docs://${strippedUrl}`, text: nestedMarkdown, mimeType: 'text/markdown' }]
})), which is inconsistent with the other resource factories that are async and
fetch fresh content on each read; either change this factory to an async factory
that computes/fetches the current nested markdown on each invocation (e.g.,
async () => ({ contents: [{ uri: `lynx-docs://${strippedUrl}`, text: await
computeNestedMarkdown(...), mimeType: 'text/markdown' }] })) to match the
behavior of the resources at lines 157-171, or if startup caching is
intentional, add a short comment above this factory referencing nestedMarkdown
and explaining that it is intentionally captured at registration to avoid
repeated fetches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e04dd5dd-f09e-45c9-80f1-5621108d9111

📥 Commits

Reviewing files that changed from the base of the PR and between 4daa4d9 and 25e7b16.

📒 Files selected for processing (2)
  • .changeset/fix-recursive-docs-mcp.md
  • packages/mcp-servers/docs-mcp-server/main.ts

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the docs MCP server to recursively discover and register resources referenced by nested llms.txt index files, preventing “Resource not found” errors when documentation is organized under sub-indexes.

Changes:

  • Implement recursive crawling of llms.txt links to register nested indexes and their referenced resources.
  • Add HTTP status handling for nested index fetches and for resource fetches during reads.
  • Add a changeset to publish a patch release for @lynx-js/docs-mcp-server.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/mcp-servers/docs-mcp-server/main.ts | Adds recursive crawling/registration of nested llms.txt resources and improves fetch error handling. |
| .changeset/fix-recursive-docs-mcp.md | Patch changeset entry for the docs MCP server. |

Comment thread packages/mcp-servers/docs-mcp-server/main.ts
Comment thread packages/mcp-servers/docs-mcp-server/main.ts Outdated

codecov Bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 0% with 73 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/mcp-servers/docs-mcp-server/main.ts | 0.00% | 73 Missing ⚠️ |


relativeci Bot commented Mar 6, 2026

Web Explorer

#9662 Bundle Size — 901.27KiB (0%).

7c2e3a9 (current) vs 1e1257e main #9648 (baseline)

Bundle metrics (2 changes)

| Status | Metric | Current #9662 | Baseline #9648 |
| --- | --- | --- | --- |
| No change | Initial JS | 45.06KiB | 45.06KiB |
| No change | Initial CSS | 2.22KiB | 2.22KiB |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 9 | 9 |
| No change | Assets | 11 | 11 |
| Change | Modules | 227 (-0.87%) | 229 |
| No change | Duplicate Modules | 11 | 11 |
| Change | Duplicate Code | 27.25% (+0.04%) | 27.24% |
| No change | Packages | 10 | 10 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (no changes)

| Status | Type | Current #9662 | Baseline #9648 |
| --- | --- | --- | --- |
| No change | JS | 497KiB | 497KiB |
| No change | Other | 402.06KiB | 402.06KiB |
| No change | CSS | 2.22KiB | 2.22KiB |

@hzy hzy force-pushed the fix/docs-mcp-recursion branch 2 times, most recently from 80a7953 to 7b4ac27 Compare March 9, 2026 10:01
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/mcp-servers/docs-mcp-server/main.ts`:
- Around line 49-54: The crawler in crawlAndRegisterResources and related blocks
resolves nested link.url against the original root baseURL instead of the
current llms.txt location, so relative links like ./foo.md or ../bar/llms.txt
break; fix by resolving each link against the current resource's URL before
using or recursing: compute a resolved URL using new URL(link.url,
currentResourceBase) where currentResourceBase is the URL of the llms.txt (or
the full URL you just fetched/parsed) rather than the passed-in root baseURL,
use that resolved URL for fetching/registering and pass its origin/path (or the
resolved URL) as the base for recursive calls (update occurrences in
crawlAndRegisterResources and the other blocks mentioned: 99-138, 157-171,
222-228).
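
The resolution issue can be seen in a short sketch using the standard WHATWG `URL` constructor. `resolveLink` is a hypothetical helper and `example.test` is a placeholder host, neither of which comes from the PR.

```typescript
// Hypothetical helper: resolve a link found inside an llms.txt file
// against the URL of that file, not against the site root.
function resolveLink(link: string, currentIndexUrl: string): string {
  return new URL(link, currentIndexUrl).href;
}

const nestedIndex = "https://example.test/api/llms.txt";
const root = "https://example.test/";

// Resolved against the nested index, relative links stay under /api/.
console.log(resolveLink("./foo.md", nestedIndex)); // https://example.test/api/foo.md
console.log(resolveLink("../bar/llms.txt", nestedIndex)); // https://example.test/bar/llms.txt

// Resolved against the root, the same relative link points somewhere else.
console.log(resolveLink("./foo.md", root)); // https://example.test/foo.md

// Absolute links are unaffected by the base.
console.log(resolveLink("https://example.test/guide.md", nestedIndex)); // https://example.test/guide.md
```

Passing the resolved URL of each nested index down as the base for its own links would let the recursion handle indexes at any depth.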

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2b2081ec-7d5c-4d04-8e48-15dd4bbf79a3

📥 Commits

Reviewing files that changed from the base of the PR and between 25e7b16 and 7b4ac27.

📒 Files selected for processing (2)
  • .changeset/fix-recursive-docs-mcp.md
  • packages/mcp-servers/docs-mcp-server/main.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • .changeset/fix-recursive-docs-mcp.md

Comment thread packages/mcp-servers/docs-mcp-server/main.ts

codspeed-hq Bot commented Mar 9, 2026

Merging this PR will degrade performance by 5.37%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 79 untouched benchmarks
⏩ 26 skipped benchmarks (see footnote 1)

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- |
| 002-hello-reactLynx-destroyBackground | 863.5 µs | 912.4 µs | -5.37% |
| 008-many-use-state-destroyBackground | 9.5 ms | 8 ms | +19.3% |

Comparing fix/docs-mcp-recursion (7c2e3a9) with main (460ddbd)

Open in CodSpeed

Footnotes

  1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


relativeci Bot commented Apr 13, 2026

React MTF Example

#1220 Bundle Size — 206.65KiB (-0.39%).

7c2e3a9 (current) vs 1e1257e main #1206 (baseline)

Bundle metrics (2 changes)

| Status | Metric | Current #1220 | Baseline #1206 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| Change | Cache Invalidation | 46.38% | 46.17% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 3 | 3 |
| No change | Modules | 192 | 192 |
| No change | Duplicate Modules | 77 | 77 |
| Change | Duplicate Code | 44.36% (-0.05%) | 44.38% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (1 change, 1 improvement)

| Status | Type | Current #1220 | Baseline #1206 |
| --- | --- | --- | --- |
| No change | IMG | 111.23KiB | 111.23KiB |
| Improvement | Other | 95.42KiB (-0.84%) | 96.23KiB |


relativeci Bot commented Apr 13, 2026

React External

#1202 Bundle Size — 690.27KiB (-0.4%).

7c2e3a9 (current) vs 1e1257e main #1188 (baseline)

Bundle metrics (1 change)

| Status | Metric | Current #1202 | Baseline #1188 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| Change | Cache Invalidation | 40.81% | 40.57% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 3 | 3 |
| No change | Modules | 17 | 17 |
| No change | Duplicate Modules | 5 | 5 |
| No change | Duplicate Code | 8.59% | 8.59% |
| No change | Packages | 0 | 0 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (1 change, 1 improvement)

| Status | Type | Current #1202 | Baseline #1188 |
| --- | --- | --- | --- |
| Improvement | Other | 690.27KiB (-0.4%) | 693.04KiB |


relativeci Bot commented Apr 13, 2026

React Example

#8089 Bundle Size — 235.77KiB (-0.31%).

7c2e3a9 (current) vs 1e1257e main #8075 (baseline)

Bundle metrics (2 changes)

| Status | Metric | Current #8089 | Baseline #8075 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| Change | Cache Invalidation | 38.37% | 38.18% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 4 | 4 |
| No change | Modules | 197 | 197 |
| No change | Duplicate Modules | 80 | 80 |
| Change | Duplicate Code | 44.85% (-0.04%) | 44.87% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (1 change, 1 improvement)

| Status | Type | Current #8089 | Baseline #8075 |
| --- | --- | --- | --- |
| No change | IMG | 145.76KiB | 145.76KiB |
| Improvement | Other | 90.01KiB (-0.82%) | 90.75KiB |

@hzy hzy force-pushed the fix/docs-mcp-recursion branch from 6877936 to 7c2e3a9 Compare May 12, 2026 09:05

relativeci Bot commented May 12, 2026

React Example with Element Template

#355 Bundle Size — 197.79KiB (0%).

7c2e3a9 (current) vs 1e1257e main #341 (baseline)

Bundle metrics (2 changes)

| Status | Metric | Current #355 | Baseline #341 |
| --- | --- | --- | --- |
| No change | Initial JS | 0B | 0B |
| No change | Initial CSS | 0B | 0B |
| No change | Cache Invalidation | 0% | 0% |
| No change | Chunks | 0 | 0 |
| No change | Assets | 4 | 4 |
| Change | Modules | 80 (-1.23%) | 81 |
| No change | Duplicate Modules | 23 | 23 |
| Change | Duplicate Code | 40.31% (+0.05%) | 40.29% |
| No change | Packages | 2 | 2 |
| No change | Duplicate Packages | 0 | 0 |

Bundle size by type (no changes)

| Status | Type | Current #355 | Baseline #341 |
| --- | --- | --- | --- |
| No change | IMG | 145.76KiB | 145.76KiB |
| No change | Other | 52.03KiB | 52.03KiB |

@colinaaa colinaaa merged commit 705a3a3 into main May 12, 2026
134 of 147 checks passed
@colinaaa colinaaa deleted the fix/docs-mcp-recursion branch May 12, 2026 13:01

3 participants